RNN-LDA Clustering for Feature Based DNN Adaptation
نویسندگان
چکیده
Model based deep neural network (DNN) adaptation approaches often require multi-pass decoding in test time. Input feature based DNN adaptation, for example, based on latent Dirichlet allocation (LDA) clustering, provide a more efficient alternative. In conventional LDA clustering, the transition and correlation between neighboring clusters is ignored. In order to address this issue, a recurrent neural network (RNN) based clustering scheme is proposed to learn both the standard LDA cluster labels and their natural correlation over time in this paper. In addition to directly using the resulting RNN-LDA as input features during DNN adaptation, a range of techniques were investigated to condition the DNN hidden layer parameters or activation outputs on the RNN-LDA features. On a DARPA Gale Mandarin Chinese broadcast speech transcription task, the proposed RNN-LDA cluster features adapted DNN system outperformed both the baseline un-adapted DNN system and conventional LDA features adapted DNN system by 8% relative on the most difficult Phoenix TV subset. Consistent improvements were also obtained after further combination with model based adaptation approaches.
منابع مشابه
Improved feature processing for deep neural networks
In this paper, we investigate alternative ways of processing MFCC-based features to use as the input to Deep Neural Networks (DNNs). Our baseline is a conventional feature pipeline that involves splicing the 13-dimensional front-end MFCCs across 9 frames, followed by applying LDA to reduce the dimension to 40 and then further decorrelation using MLLT. Confirming the results of other groups, we ...
متن کاملAn Environmental Feature Representation for Robust Speech Recognition and for Environment Identification
In this paper we investigate environment feature representations, which we refer to as e-vectors, that can be used for environment adaption in automatic speech recognition (ASR), and for environment identification. Inspired by the fact that ivectors in the total variability space capture both speaker and channel environment variability, our proposed e-vectors are extracted from i-vectors. Two e...
متن کاملGaussian Mixture Model and Deep Neural Network based Vehicle Detection and Classification
The exponential rise in the demand of vision based traffic surveillance systems have motivated academia-industries to develop optimal vehicle detection and classification scheme. In this paper, an adaptive learning rate based Gaussian mixture model (GMM) algorithm has been developed for background subtraction of multilane traffic data. Here, vehicle rear information and road dash-markings have ...
متن کاملLearning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models
Factorized Hidden Layer (FHL) adaptation has been proposed for speaker adaptation of deep neural network (DNN) based acoustic models. In FHL adaptation, a speaker-dependent (SD) transformation matrix and an SD bias are included in addition to the standard affine transformation. The SD transformation is a linear combination of rank-1 matrices whereas the SD bias is a linear combination of vector...
متن کاملTransferring knowledge from a RNN to a DNN
Deep Neural Network (DNN) acoustic models have yielded many state-of-the-art results in Automatic Speech Recognition (ASR) tasks. More recently, Recurrent Neural Network (RNN) models have been shown to outperform DNNs counterparts. However, state-of-the-art DNN and RNN models tend to be impractical to deploy on embedded systems with limited computational capacity. Traditionally, the approach fo...
متن کامل